{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a href=\"https://colab.research.google.com/github/run-llama/llama_index/blob/main/docs/examples/vector_stores/OceanBaseVectorStore.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# OceanBase Vector Store\n",
    "\n",
    "> [OceanBase Database](https://github.com/oceanbase/oceanbase) is a distributed relational database. It is developed entirely by Ant Group. The OceanBase Database is built on a common server cluster. Based on the Paxos protocol and its distributed structure, the OceanBase Database provides high availability and linear scalability. The OceanBase Database is not dependent on specific hardware architectures.\n",
    "\n",
    "This notebook describes in detail how to use the OceanBase vector store functionality in LlamaIndex."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Setup"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%pip install llama-index-vector-stores-oceanbase\n",
    "%pip install llama-index\n",
    "# choose dashscope as embedding and llm model, your can also use default openai or other model to test\n",
    "%pip install llama-index-embeddings-dashscope\n",
    "%pip install llama-index-llms-dashscope"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Deploy a standalone OceanBase server with docker"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%docker run --name=ob433 -e MODE=slim -p 2881:2881 -d oceanbase/oceanbase-ce:4.3.3.0-100000142024101215"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Creating ObVecClient"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from pyobvector import ObVecClient\n",
    "\n",
    "client = ObVecClient()\n",
    "client.perform_raw_text_sql(\n",
    "    \"ALTER SYSTEM ob_vector_memory_limit_percentage = 30\"\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Config dashscope embedding model and LLM."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# set Embbeding model\n",
    "import os\n",
    "from llama_index.core import Settings\n",
    "from llama_index.embeddings.dashscope import DashScopeEmbedding\n",
    "\n",
    "# Global Settings\n",
    "Settings.embed_model = DashScopeEmbedding()\n",
    "\n",
    "# config llm model\n",
    "from llama_index.llms.dashscope import DashScope, DashScopeGenerationModels\n",
    "\n",
    "dashscope_llm = DashScope(\n",
    "    model_name=DashScopeGenerationModels.QWEN_MAX,\n",
    "    api_key=os.environ.get(\"DASHSCOPE_API_KEY\", \"\"),\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Load documents"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.core import (\n",
    "    SimpleDirectoryReader,\n",
    "    load_index_from_storage,\n",
    "    VectorStoreIndex,\n",
    "    StorageContext,\n",
    ")\n",
    "from llama_index.vector_stores.oceanbase import OceanBaseVectorStore"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Download Data & Load Data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!mkdir -p 'data/paul_graham/'\n",
    "!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# load documents\n",
    "documents = SimpleDirectoryReader(\"./data/paul_graham/\").load_data()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "oceanbase = OceanBaseVectorStore(\n",
    "    client=client,\n",
    "    dim=1536,\n",
    "    drop_old=True,\n",
    "    normalize=True,\n",
    ")\n",
    "\n",
    "storage_context = StorageContext.from_defaults(vector_store=oceanbase)\n",
    "index = VectorStoreIndex.from_documents(\n",
    "    documents, storage_context=storage_context\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Query Index"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'Growing up, the author worked on two main activities outside of school: writing and programming. They wrote short stories, which they admits were not particularly good, lacking plot but containing characters with strong emotions. They also started programming at a young age, initially on an IBM 1401 computer using an early version of Fortran, though they found it challenging due to the limitations of punch card input and their lack of data to process. Their programming journey真正 took off when microcomputers became available, allowing them to write more interactive programs such as games, a rocket flight predictor, and a simple word processor.'"
      ]
     },
     "execution_count": null,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# set Logging to DEBUG for more detailed outputs\n",
    "query_engine = index.as_query_engine(llm=dashscope_llm)\n",
    "res = query_engine.query(\"What did the author do growing up?\")\n",
    "res.response"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Metadata Filtering\n",
    "\n",
    "OceanBase Vector Store supports metadata filtering in the form of `=`、 `>`、`<`、`!=`、`>=`、`<=`、`in`、`not in`、`like`、`IS NULL` at query time."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'Empty Response'"
      ]
     },
     "execution_count": null,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from llama_index.core.vector_stores import (\n",
    "    MetadataFilters,\n",
    "    MetadataFilter,\n",
    ")\n",
    "\n",
    "query_engine = index.as_query_engine(\n",
    "    llm=dashscope_llm,\n",
    "    filters=MetadataFilters(\n",
    "        filters=[\n",
    "            MetadataFilter(key=\"book\", value=\"paul_graham\", operator=\"!=\"),\n",
    "        ]\n",
    "    ),\n",
    "    similarity_top_k=10,\n",
    ")\n",
    "\n",
    "res = query_engine.query(\"What did the author learn?\")\n",
    "res.response"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Delete Documents"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'Empty Response'"
      ]
     },
     "execution_count": null,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "oceanbase.delete(documents[0].doc_id)\n",
    "\n",
    "query_engine = index.as_query_engine(llm=dashscope_llm)\n",
    "res = query_engine.query(\"What did the author do growing up?\")\n",
    "res.response"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "llama-index-0y_Jm_SY-py3.12",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
